Abstract

We consider a general class of forecasting protocols, called "linear protocols", and discuss several important special cases, including multi-class forecasting. Forecasting is formalized as a game between three players: Reality, whose role is to generate observations; Forecaster, whose goal is to predict the observations; and Skeptic, who tries to make money on any lack of agreement between Forecaster's predictions and the actual observations. Our main mathematical result is that for any continuous strategy for Skeptic in a linear protocol there exists a strategy for Forecaster that does not allow Skeptic's capital to grow. This result is a meta-theorem that allows one to transform any continuous law of probability in a linear protocol into a forecasting strategy whose predictions are guaranteed to satisfy this law. We apply this meta-theorem to a weak law of large numbers in Hilbert spaces to obtain a version of the K29 prediction algorithm for linear protocols and show that this version also satisfies the attractive properties of proper calibration and resolution under a suitable choice of its kernel parameter, with no assumptions about the way the data is generated.



1 Introduction 

In [?] we suggested a new methodology for designing forecasting strategies. Considering only the simplest case of binary forecasting, we showed that any law of probability that is constructive, in the sense explained below, can be translated into a forecasting strategy that satisfies this law. In this paper this result is extended to a general class of protocols including multi-class forecasting. In proposing this approach to forecasting we were inspired by [?] and papers further developing [?], although our methods and formal results appear to be completely different.

Whereas the meta-theorem stated in [1] is mathematically trivial, the generalization considered in this paper is less so, depending on the Schauder-Tikhonov fixed-point theorem. Our general meta-theorem is stated in §4 and proved in §4 and Appendix A. The general forecasting protocols covered by this result are introduced and discussed in §§2 and 3.

In [3] we demonstrated the value of the meta-theorem by applying it to the strong law of large numbers, obtaining from it a kernel forecasting strategy which we called K29. The derivation, however, was informal, involving heuristic transitions to a limit, and this made it impossible to state formally any properties of K29. In this paper we deduce K29 in a much more direct way from the weak law of large numbers and state its properties. (For binary forecasting, this was also done in [?], and the reader might prefer to read that paper first.) The weak law of large numbers is stated and proved in §5, and K29 is derived and studied in §6.

We call the approach to forecasting using our meta-theorem "defensive forecasting": Forecaster is trying to defend himself when playing against Skeptic. The justification of this approach given in this paper and in [3] is K29's properties of proper calibration and resolution. Another justification, in a sense the ultimate justification of any forecasts, is given in [12]: defensive forecasts lead to good decisions; this result, however, is obtained there for rather simple decision problems requiring only binary forecasts, and its extensions will require this paper's results or their generalizations.

The exposition of the probability theory needed for this paper is given in [5]. The standard exposition of probability is based on Kolmogorov's measure-theoretic axioms, whereas [5] states several key laws of probability in terms of a game between Forecaster, Reality, and a third player, Skeptic. The game-theoretic laws of probability in [5] are constructive in that we explicitly construct computable winning strategies for Skeptic in various games of forecasting.

2 Forecasting as a game 

Following [5] and [?], we consider the following general forecasting protocol:

Forecasting Game 1

Players: Reality, Forecaster, Skeptic

Parameters: $X$ (data space), $Y$ (observation space), $F$ (Forecaster's move space), $S$ (Skeptic's move space), $\lambda: S \times F \times Y \to \mathbb{R}$ (Skeptic's gain function and Forecaster's loss function)

Protocol:

$\mathcal{K}_0 := 1$.

FOR $n = 1, 2, \dots$:

Reality announces $x_n \in X$.

Forecaster announces $f_n \in F$.

Skeptic announces $s_n \in S$.

Reality announces $y_n \in Y$.

$\mathcal{K}_n := \mathcal{K}_{n-1} + \lambda(s_n, f_n, y_n)$.
END FOR

Restriction on Skeptic: Skeptic must choose the $s_n$ so that his capital is always nonnegative ($\mathcal{K}_n \ge 0$ for all $n$) no matter how the other players move.

This is a perfect-information protocol: the players move in the order indicated, and each player sees the other players' moves as they are made. It specifies both an initial value for Skeptic's capital ($\mathcal{K}_0 = 1$) and a lower bound on its subsequent values ($\mathcal{K}_n \ge 0$). We will say that the $x_n$ are the data, the $y_n$ are the observations, and the $f_n$ are the forecasts. In applications, the datum $x_n$ will contain all available information deemed useful in forecasting $y_n$.

Book [5] contains several results (game-theoretic versions of limit theorems of probability theory) of the following form: Skeptic has a strategy that guarantees that either a property of agreement between the forecasts $f_n$ and observations $y_n$ is satisfied or Skeptic becomes very rich (without risking bankruptcy, according to the protocol). All specific strategies considered there have computable versions. According to Brouwer's principle (see, e.g., §1 of [?] for a recent review of the relevant literature) they must be automatically continuous; in any case, their continuity can be checked directly. In [2] we showed that, under a special choice of the players' move spaces and Skeptic's gain function $\lambda$, for any continuous strategy for Skeptic, Forecaster has a strategy that guarantees that Skeptic's capital never increases when Skeptic follows that continuous strategy. Therefore, Forecaster has strategies that ensure various properties of agreement between the forecasts and the observations.

The purpose of this paper is to extend the result of [2] to a wide class of Skeptic's gain functions $\lambda$. But first we consider several important special cases of Forecasting Game 1.

Binary forecasting 

The simplest non-trivial case, considered in [1], is where $Y = \{0, 1\}$, $F = [0, 1]$, $S = \mathbb{R}$, and

$$\lambda(s_n, f_n, y_n) = s_n (y_n - f_n). \qquad (1)$$

Intuitively, Forecaster gives probability forecasts for $y_n$: $f_n$ is his subjective probability that $y_n = 1$. The operational interpretation of $f_n$ is that it is the price that Forecaster charges for a ticket that will pay $y_n$ at the end of the $n$th round of the game; $s_n$ is the number (positive, zero, or negative) of such tickets that Skeptic chooses to buy.
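The following is a minimal Python sketch of Forecasting Game 1 specialized to binary forecasting with gain (1). The function names and the particular strategies used in the example (a constant forecast of 0.5, and a Skeptic who always buys a number of tickets equal to half of his current capital) are hypothetical illustrations, not taken from the paper.

```python
def binary_gain(s, f, y):
    """Skeptic's gain (1): s tickets bought at price f, each paying y."""
    return s * (y - f)


def play_binary(reality_x, forecaster, skeptic, reality_y, rounds):
    """Run Forecasting Game 1 with gain (1); return [K_0, K_1, ..., K_rounds]."""
    capital = 1.0                          # K_0 := 1
    history = []                           # public record of (x_n, f_n, s_n, y_n)
    capitals = [capital]
    for _ in range(rounds):
        x = reality_x(history)             # Reality announces x_n
        f = forecaster(history, x)         # Forecaster announces f_n in [0, 1]
        s = skeptic(history, x, f)         # Skeptic announces s_n in R
        y = reality_y(history, x, f, s)    # Reality announces y_n in {0, 1}
        capital += binary_gain(s, f, y)    # K_n := K_{n-1} + s_n (y_n - f_n)
        capitals.append(capital)
        history.append((x, f, s, y))
    return capitals


def bet_half_capital(history, x, f):
    """Hypothetical Skeptic: buy 0.5 * K_{n-1} tickets.

    The worst loss, s * f, is at most half the capital when 0 <= f <= 1,
    so the capital never becomes negative.
    """
    capital = 1.0 + sum(binary_gain(s_i, f_i, y_i) for _, f_i, s_i, y_i in history)
    return 0.5 * capital


caps = play_binary(lambda h: None, lambda h, x: 0.5, bet_half_capital,
                   lambda h, x, f, s: 1, rounds=3)
print(caps)  # prints [1.0, 1.25, 1.5625, 1.953125]
```

Against a forecaster who keeps saying 0.5 while every $y_n$ equals 1, this Skeptic multiplies his capital by 1.25 each round without ever risking bankruptcy.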
Bounded regression

This is the most straightforward extension of binary forecasting, considered in [?], §3.2. The move spaces are $Y = F = [A, B]$, where $A$ and $B$ are two constants, and $S = \mathbb{R}$; the gain function is, as before, (1). This protocol allows one to prove a strong law of large numbers ([?], Proposition 3.3) and a simple one-sided law of the iterated logarithm ([?], Corollary 5.1).

Multi-class forecasting 

Another extension of binary forecasting is the protocol where $Y$ is a finite set, $F$ is the set of all probability distributions on $Y$, $S$ is the set of all real-valued functions on $Y$, and

$$\lambda(s_n, f_n, y_n) = s_n(y_n) - \int s_n \, df_n.$$

The intuition behind Skeptic's move $s_n$ is that Skeptic buys the ticket which pays $s_n(y_n)$ after $y_n$ is announced; he is charged $\int s_n \, df_n$ for this ticket.

The binary forecasting protocol is "isomorphic" to the special case of this protocol where $Y = \{0, 1\}$: Forecaster's move $f_n$ in the binary forecasting protocol is represented by the probability distribution $f'_n$ on $\{0, 1\}$ assigning weight $f_n$ to $\{1\}$, and Skeptic's move $s_n$ in the binary forecasting protocol is represented by any function $s'_n$ on $\{0, 1\}$ such that $s'_n(1) - s'_n(0) = s_n$. The isomorphism between these two protocols follows from

$$s'_n(y_n) - \bigl(s'_n(1) f_n + s'_n(0)(1 - f_n)\bigr) = s'_n(y_n) - s'_n(0) - s_n f_n = s_n (y_n - f_n)$$

(remember that $y_n \in \{0, 1\}$).
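As a quick numerical sanity check of this isomorphism, the Python sketch below (helper names are hypothetical) represents a distribution on $\{0, 1\}$ and a function on $\{0, 1\}$ as dictionaries and compares the multi-class gain with (1) for several values of the free constant $s'_n(0)$.

```python
import itertools


def multiclass_gain(s, f, y):
    """Multi-class gain: the ticket pays s[y]; its price is the f-expectation of s."""
    price = sum(s[label] * prob for label, prob in f.items())
    return s[y] - price


def to_multiclass(s, f, offset=0.0):
    """Represent a binary move: f' gives weight f to 1, and s'(1) - s'(0) = s.

    `offset` is the arbitrary value s'(0); the gain does not depend on it.
    """
    s_prime = {0: offset, 1: offset + s}
    f_prime = {0: 1.0 - f, 1: f}
    return s_prime, f_prime


for s, f, y, offset in itertools.product([-2.0, 0.5, 3.0], [0.0, 0.3, 1.0],
                                         [0, 1], [0.0, 7.5]):
    s_prime, f_prime = to_multiclass(s, f, offset)
    assert abs(multiclass_gain(s_prime, f_prime, y) - s * (y - f)) < 1e-12
```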

Bounded mean-variance forecasting 

In this protocol, $Y = [A, B]$, where $A$ and $B$ are again two constants, $F = S = \mathbb{R}^2$, and

$$\lambda(s_n, f_n, y_n) = \lambda\bigl((M_n, V_n), (m_n, v_n), y_n\bigr) = M_n (y_n - m_n) + V_n\bigl((y_n - m_n)^2 - v_n\bigr).$$

Intuitively, Forecaster is asked to forecast $y_n$ with a number $m_n$ and also to forecast the accuracy $(y_n - m_n)^2$ of his first forecast with a number $v_n$. This protocol, although usually without the restriction $y_n \in [A, B]$, is used extensively in [9] (e.g., in Chaps. 4 and 5).

An equivalent representation of this protocol is $Y = \{(t, t^2) \mid t \in [A, B]\}$, $F = S = \mathbb{R}^2$, and

$$\lambda(s_n, f_n, y_n) = \lambda\bigl((s'_n, s''_n), (f'_n, f''_n), (t_n, t_n^2)\bigr) = s'_n (t_n - f'_n) + s''_n (t_n^2 - f''_n).$$

The equivalence of the two representations can be seen as follows: Reality's move $(x_n, t_n)$ in the first representation corresponds to $(x_n, y_n) = (x_n, (t_n, t_n^2))$ in the second representation, Forecaster's move $(m_n, v_n)$ in the first representation corresponds to $(f'_n, f''_n) = (m_n, v_n + m_n^2)$ in the second representation, and Skeptic's move $(s'_n, s''_n)$ in the second representation corresponds to $(M_n, V_n) = (s'_n + 2 m_n s''_n, s''_n)$ in the first representation. This establishes a bijection between Reality's move spaces, a bijection between Forecaster's move spaces, and a bijection between Skeptic's move spaces in the two representations;
Skeptic's gains are also the same in the two representations:

$$
\begin{aligned}
s'_n (t_n - f'_n) + s''_n (t_n^2 - f''_n)
&= s'_n (t_n - m_n) + s''_n\bigl((t_n - m_n)^2 + 2 (t_n - m_n) m_n + m_n^2 - (v_n + m_n^2)\bigr) \\
&= (s'_n + 2 m_n s''_n)(t_n - m_n) + s''_n\bigl((t_n - m_n)^2 - v_n\bigr) \\
&= M_n (t_n - m_n) + V_n\bigl((t_n - m_n)^2 - v_n\bigr).
\end{aligned}
$$

3 Linear protocol

Forecasting Game 1 is too general to derive results of the kind we are interested in. In this section we will introduce a narrower protocol which will still be wide enough to cover all special cases considered so far.

All move spaces are now subsets of a Hilbert space $L$ (we allow $L$ to be non-separable or finite-dimensional; in fact, in this paper we emphasize the case where $L = \mathbb{R}^m$ for some positive integer $m$). The observation space is a non-empty pre-compact subset $Y \subseteq L$ (we say that a set is pre-compact if its closure is compact; if $L = \mathbb{R}^m$, this is equivalent to it being bounded), Forecaster's move space $F$ is the whole of $L$, and Skeptic's move space $S$ is also the whole of $L$. Skeptic's gain function is

$$\lambda(s_n, f_n, y_n) := \langle s_n, y_n - f_n \rangle_L.$$

Therefore, we consider the following perfect-information game:

Forecasting Game 2

Players: Reality, Forecaster, Skeptic

Parameters: $X$, $L$ (Hilbert space), $Y$ (non-empty pre-compact subset of $L$)

Protocol:

$\mathcal{K}_0 := 1$.

FOR $n = 1, 2, \dots$:

Reality announces $x_n \in X$.

Forecaster announces $f_n \in L$.

Skeptic announces $s_n \in L$.

Reality announces $y_n \in Y$.

$\mathcal{K}_n := \mathcal{K}_{n-1} + \langle s_n, y_n - f_n \rangle_L. \qquad (2)$
END FOR

Restriction on Skeptic: Skeptic must choose the $s_n$ so that his capital is always nonnegative no matter how the other players move.
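The Python sketch below (hypothetical helper names) checks numerically that the bounded mean-variance protocol in its first representation and in its second, linear representation assign Skeptic the same gain under the bijections described in §2; in the second representation the gain is computed as the linear-protocol inner product $\langle s_n, y_n - f_n \rangle$ with $L = \mathbb{R}^2$.

```python
import itertools


def linear_gain(s, f, y):
    """Gain of the linear protocol, <s, y - f>, for vectors given as tuples."""
    return sum(s_j * (y_j - f_j) for s_j, y_j, f_j in zip(s, y, f))


def mean_variance_gain(M, V, m, v, t):
    """Gain in the first representation: M (t - m) + V ((t - m)^2 - v)."""
    return M * (t - m) + V * ((t - m) ** 2 - v)


def to_first_representation(s1, s2, m):
    """Skeptic's move (s', s'') in the second representation -> (M, V)."""
    return s1 + 2.0 * m * s2, s2


for t, m, v, s1, s2 in itertools.product([-1.0, 0.2, 1.5], [0.0, 0.7], [0.1, 2.0],
                                         [-1.0, 3.0], [0.5, -2.0]):
    y = (t, t * t)                  # Reality: y_n = (t_n, t_n^2)
    f = (m, v + m * m)              # Forecaster: (f'_n, f''_n) = (m_n, v_n + m_n^2)
    M, V = to_first_representation(s1, s2, m)
    assert abs(linear_gain((s1, s2), f, y) - mean_variance_gain(M, V, m, v, t)) < 1e-9
```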
Let us check that the specific protocols considered in the previous section are covered by this linear protocol (and for all those protocols $L$ can be taken finite-dimensional, $L = \mathbb{R}^m$ for some $m \in \{1, 2, \dots\}$). At first sight, even the binary forecasting protocol is not covered, as Forecaster's move space is $F = [0, 1]$ rather than $\mathbb{R}$. It is easy to see, however, that any move $f_n$ by Forecaster outside the convex closure $\overline{\mathrm{co}}\,Y$ of the observation space (the convex closure $\overline{\mathrm{co}}\,A$ of a set $A$ is defined to be the intersection of all closed convex sets containing $A$) is always inadmissible, in the sense that there exists a reply $s_n$ by Skeptic making him arbitrarily rich regardless of Reality's move, and so we can as well choose $F := \overline{\mathrm{co}}\,Y$. Indeed, suppose that $f_n \notin \overline{\mathrm{co}}\,Y$ in the linear protocol. Since $Y$ is pre-compact, $\overline{\mathrm{co}}\,Y$ is compact ([?], Theorem 3.20(c)). By the Hahn-Banach theorem ([?], Theorem 3.4(b)), there exists a vector $s_n \in L$ such that

$$\inf_{y \in \overline{\mathrm{co}}\,Y} \langle s_n, y - f_n \rangle_L > 0.$$

(It would have been sufficient for either $\{f_n\}$ or $\overline{\mathrm{co}}\,Y$ to be compact; in fact both are.) Skeptic's move $C s_n$ can make him as rich as he wishes, as $C > 0$ can be arbitrarily large. In what follows, we will usually assume that Forecaster's move space is $\overline{\mathrm{co}}\,Y$ and use $F$ as a shorthand for $\overline{\mathrm{co}}\,Y$.
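When $\overline{\mathrm{co}}\,Y$ is a box, for instance $Y = \{0, 1\}^m$ with $\overline{\mathrm{co}}\,Y = [0, 1]^m$, a separating vector can be written down explicitly: taking $s_n := \sigma(f_n) - f_n$, where $\sigma$ is the nearest-point projection onto the box (computed by clipping), gives $\langle s_n, y - f_n \rangle_L \ge \|s_n\|_L^2 > 0$ for every $y$ in the box. This projection construction is an illustration we substitute for the general Hahn-Banach argument in this special case; the Python sketch below assumes NumPy and uses a hypothetical function name.

```python
import numpy as np


def punishing_move(f, low=0.0, high=1.0):
    """For f outside the box [low, high]^m, return s with <s, y - f> >= ||s||^2 > 0
    for every y in the box; s is the displacement from f to its projection."""
    f = np.asarray(f, dtype=float)
    projection = np.clip(f, low, high)    # closest point of the box to f
    return projection - f


f = np.array([1.3, 0.5, -0.2])            # an inadmissible forecast
s = punishing_move(f)                      # approximately [-0.3, 0.0, 0.2]
rng = np.random.default_rng(0)
for y in rng.uniform(0.0, 1.0, size=(1000, 3)):
    # Coordinatewise, s_j and y_j - clip(f_j) never have opposite signs,
    # so the gain is at least ||s||^2; scaling s by C scales the gain by C.
    assert s @ (y - f) >= s @ s - 1e-12
```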

Now it is obvious that the binary forecasting, bounded regression, and bounded mean-variance forecasting (in its second representation) protocols are special cases of the linear protocol (perhaps with $F = \overline{\mathrm{co}}\,Y$). For the multi-class forecasting protocol, we should represent $Y$ as the vertices

$$y^1 := (1, 0, 0, \dots, 0), \quad y^2 := (0, 1, 0, \dots, 0), \quad \dots, \quad y^m := (0, 0, 0, \dots, 1)$$

of the standard simplex in $\mathbb{R}^m$, where $m$ is the size of $Y$, represent the probability distributions $f$ on $Y$ as vectors $(f\{y^1\}, \dots, f\{y^m\})$ in $\mathbb{R}^m$, and represent the real-valued functions $s$ on $Y$ as vectors $(s(y^1), \dots, s(y^m))$ in $\mathbb{R}^m$.
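With this representation the multi-class gain $s_n(y_n) - \int s_n\, df_n$ becomes the linear gain $\langle s_n, y_n - f_n \rangle$. The Python sketch below (NumPy; the three labels and all numbers are made up for illustration) encodes labels as simplex vertices, distributions and functions as vectors, and checks that the two gains agree.

```python
import numpy as np

labels = ["a", "b", "c"]                    # hypothetical label set, m = 3
f = {"a": 0.2, "b": 0.5, "c": 0.3}          # Forecaster: distribution on Y
s = {"a": 1.0, "b": -2.0, "c": 0.5}         # Skeptic: real function on Y


def vertex(label):
    """Vertex y^k of the standard simplex representing the k-th label."""
    v = np.zeros(len(labels))
    v[labels.index(label)] = 1.0
    return v


def as_vector(mapping):
    """(g(y^1), ..., g(y^m)) for a distribution or function g on Y."""
    return np.array([mapping[label] for label in labels])


for y in labels:
    multiclass = s[y] - sum(s[label] * f[label] for label in labels)
    linear = as_vector(s) @ (vertex(y) - as_vector(f))
    assert abs(multiclass - linear) < 1e-12
```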

4 Meta-theorem 

In this section we state the main mathematical result of this paper: for any continuous strategy for Skeptic there exists a strategy for Forecaster that does not allow Skeptic's capital to grow, regardless of what Reality is doing. As in [1], we make Skeptic announce his strategy for each round at the outset of that round rather than announce his strategy for the whole game at the beginning of the game, and we drop all restrictions on Skeptic. Forecaster's move space is restricted to $F = \overline{\mathrm{co}}\,Y$. The resulting perfect-information game is:

Forecasting Game 3

Players: Reality, Forecaster, Skeptic

Parameters: $X$, $L$ (Hilbert space), $Y \subseteq L$ (non-empty and pre-compact)

Protocol:

$\mathcal{K}_0$ is set to a real number.

FOR $n = 1, 2, \dots$:

Reality announces $x_n \in X$.

Skeptic announces a continuous $S_n: \overline{\mathrm{co}}\,Y \to L$.

Forecaster announces $f_n \in \overline{\mathrm{co}}\,Y$.

Reality announces $y_n \in Y$.

$\mathcal{K}_n := \mathcal{K}_{n-1} + \langle S_n(f_n), y_n - f_n \rangle_L$.
END FOR



Theorem 1 Forecaster has a strategy in Forecasting Game 3 that ensures $\mathcal{K}_0 \ge \mathcal{K}_1 \ge \mathcal{K}_2 \ge \cdots$.

Proof Fix a round $n$ and Skeptic's move $S_n: F \to L$ (we will refer to $S_n$ as a vector field on $F$). Our task is to prove the existence of a point $f_n \in F$ such that, for all $y \in Y$, $\langle S_n(f_n), y - f_n \rangle_L \le 0$.

If for some $f \in \partial F$ (we use $\partial A$ to denote the boundary of $A \subseteq L$) the vector $S_n(f)$ is normal and directed exteriorly to $F$ (in the sense that $\langle S_n(f), y - f \rangle_L \le 0$ for all $y \in F$), we can take such $f$ as $f_n$. Therefore, we assume, without loss of generality, that $S_n$ is never normal and directed exteriorly on $\partial F$. Since $F$ is compact (see §3), convex, and non-empty, Lemma 1 in Appendix A then gives an $f$ such that $S_n(f) = 0$, and we can take such $f$ as $f_n$. ∎

Remark Notice that Theorem 1 will not become weaker if the first move by Reality (choosing $x_n$) is removed from each round of the protocol.



5 A weak law of large numbers in Hilbert space 

Unfortunately, the usual law of large numbers is not useful for the purpose of designing forecasting strategies (see the discussion in [?]). Therefore, we state a generalized law of large numbers; at the end of this section we will explain connections with the usual law of large numbers. In this section we consider Forecasting Game 2 without the requirement $\mathcal{K}_0 = 1$ and with the restriction on Skeptic dropped. If we fix a strategy for Skeptic and Skeptic's initial capital $\mathcal{K}_0$ (not necessarily 1 or even a positive number), $\mathcal{K}_n$ defined by (2) becomes a function of Reality's and Forecaster's moves. Such functions will be called capital processes.

Let $\Phi: F \times X \to H$ (as usual, $F = \overline{\mathrm{co}}\,Y$) be a feature mapping into a Hilbert space $H$; $H$ is called the feature space. The next theorem uses the notion of tensor product; for details, see Appendix B.

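Theorem 2 The function

$$\mathcal{K}_n := \Bigl\| \sum_{i=1}^n (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2 - \sum_{i=1}^n \|y_i - f_i\|_L^2 \, \|\Phi(f_i, x_i)\|_H^2 \qquad (3)$$

is a capital process (not necessarily non-negative) of some strategy for Skeptic.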
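Proof We start by noticing that

$$
\begin{aligned}
\mathcal{K}_n - \mathcal{K}_{n-1}
&= \Bigl\| \sum_{i=1}^{n-1} (y_i - f_i) \otimes \Phi(f_i, x_i) + (y_n - f_n) \otimes \Phi(f_n, x_n) \Bigr\|_{L \otimes H}^2
 - \Bigl\| \sum_{i=1}^{n-1} (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2
 - \|y_n - f_n\|_L^2 \, \|\Phi(f_n, x_n)\|_H^2 \\
&= 2 \Bigl\langle \sum_{i=1}^{n-1} (y_i - f_i) \otimes \Phi(f_i, x_i),\ (y_n - f_n) \otimes \Phi(f_n, x_n) \Bigr\rangle_{L \otimes H} \\
&= 2 \sum_{i=1}^{n-1} \bigl\langle \Phi(f_i, x_i), \Phi(f_n, x_n) \bigr\rangle_H \, \langle y_i - f_i, y_n - f_n \rangle_L
\end{aligned}
$$

(in the last two equalities we used (18) and (19) from Appendix B). Introducing the notation

$$k\bigl((f, x), (f', x')\bigr) := \bigl\langle \Phi(f, x), \Phi(f', x') \bigr\rangle_H, \qquad (4)$$

where $(f, x), (f', x') \in F \times X$, we can rewrite the expression for $\mathcal{K}_n - \mathcal{K}_{n-1}$ as

$$\Bigl\langle 2 \sum_{i=1}^{n-1} k\bigl((f_i, x_i), (f_n, x_n)\bigr)(y_i - f_i),\ y_n - f_n \Bigr\rangle_L.$$

Therefore, $\mathcal{K}_n$ is the capital process corresponding to Skeptic's strategy

$$s_n := 2 \sum_{i=1}^{n-1} k\bigl((f_i, x_i), (f_n, x_n)\bigr)(y_i - f_i); \qquad (5)$$

this completes the proof. ∎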
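Theorem 2 can be checked numerically in finite dimensions, where $L = \mathbb{R}^p$, $H = \mathbb{R}^q$, $l \otimes h$ is identified with the outer product $l h^{\mathsf T}$, and the norm of $L \otimes H$ is the Frobenius norm. The Python sketch below (NumPy; random vectors stand in for the $y_i$, $f_i$, and $\Phi(f_i, x_i)$, and no constraint $f_i \in F$ is imposed, since the identity does not need one) accumulates the gains of strategy (5) and compares the running total with formula (3). Function names are hypothetical.

```python
import numpy as np


def capital_formula(ys, fs, phis):
    """K_n from (3), with l ⊗ h represented by np.outer(l, h)."""
    if len(ys) == 0:
        return 0.0
    T = sum(np.outer(y - f, p) for y, f, p in zip(ys, fs, phis))
    penalty = sum((y - f) @ (y - f) * (p @ p) for y, f, p in zip(ys, fs, phis))
    return float(np.sum(T * T) - penalty)    # squared Frobenius norm minus penalty


def skeptic_move(n, ys, fs, phis):
    """Move (5) in round n (0-based): 2 * sum_{i<n} k(z_i, z_n) (y_i - f_i),
    where k(z_i, z_n) = <Phi(z_i), Phi(z_n)>_H as in (4)."""
    s = np.zeros_like(fs[n])
    for i in range(n):
        s += 2.0 * (phis[i] @ phis[n]) * (ys[i] - fs[i])
    return s


rng = np.random.default_rng(1)
ys = list(rng.normal(size=(6, 2)))
fs = list(rng.normal(size=(6, 2)))
phis = list(rng.normal(size=(6, 3)))
capital = 0.0                                 # K_0 = 0 for the process (3)
for n in range(6):
    capital += skeptic_move(n, ys, fs, phis) @ (ys[n] - fs[n])
    assert abs(capital - capital_formula(ys[:n + 1], fs[:n + 1], phis[:n + 1])) < 1e-9
```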
More standard statements of the weak law 

In the rest of this section we explain connections of Theorem 2 with more standard statements of the weak law of large numbers; in this part of the paper we will use some notions introduced in [?]. The rest of the paper does not depend on this material, and the reader may wish to skip this subsection.
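Let us assume that

$$c_\Phi := \sup_{(f, x) \in F \times X} \|\Phi(f, x)\|_H < \infty.$$

We will use the notation $\operatorname{diam}(Y) := \sup_{y, y' \in Y} \|y - y'\|_L$; it is clear that $\operatorname{diam}(Y) < \infty$. For any initial capital $\mathcal{K}_0$,

$$\mathcal{K}_n := \mathcal{K}_0 + \Bigl\| \sum_{i=1}^n (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2 - \sum_{i=1}^n \|y_i - f_i\|_L^2 \, \|\Phi(f_i, x_i)\|_H^2$$

is the capital process of some strategy for Skeptic. Suppose a positive integer $N$ (the duration of the game, or the horizon) is given in advance and $\mathcal{K}_0 := \operatorname{diam}^2(Y)\, c_\Phi^2 N$. Since $y_i$ and $f_i$ both lie in $F$, whose diameter equals $\operatorname{diam}(Y)$, each term of the last sum is at most $\operatorname{diam}^2(Y)\, c_\Phi^2$. Therefore, in the game lasting $N$ rounds, $\mathcal{K}_n$ is never negative and

$$\mathcal{K}_N \ge \Bigl\| \sum_{i=1}^N (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2.$$

If we do not believe that Skeptic can increase his capital $1/\delta$-fold for a small $\delta > 0$ without risking bankruptcy, we should believe that

$$\Bigl\| \sum_{i=1}^N (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2 < \operatorname{diam}^2(Y)\, c_\Phi^2 N / \delta,$$

which can be rewritten as

$$\Bigl\| \frac{1}{N} \sum_{i=1}^N (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H} < \operatorname{diam}(Y)\, c_\Phi\, (N\delta)^{-1/2}. \qquad (6)$$

In the terminology of [?], the game-theoretic lower probability of the event (6) is at least $1 - \delta$.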

The game-theoretic version of Bernoulli's law of large numbers is a special case of (6) corresponding to $\Phi(f, x) = 1 \in H := \mathbb{R}$ for all $f$ and $x$, $Y = \{0, 1\}$, and $|X| = 1$ (the last two conditions mean that we are considering the binary forecasting protocol without the data); as usual, we assume that the $f_i$ are chosen from $\overline{\mathrm{co}}\,Y = [0, 1]$. As explained in [?], in combination with the measurability of Skeptic's strategy guaranteeing (6), this implies that the measure-theoretic probability of the event (6) is at least $1 - \delta$, assuming that the $y_i$ are generated by a probability distribution and that each $f_i$ is the conditional probability that $y_i = 1$ given $y_1, \dots, y_{i-1}$. This measure-theoretic result was proved by Kolmogorov in 1929 (see [?]) and is the origin of the name "K29 strategy".

We will see in the next section that the feature-space version (6) of the weak law of large numbers is much more useful than the standard version for the purpose of forecasting.
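To get a feel for the size of the bound in (6), the short Python sketch below (the function name is hypothetical) evaluates its right-hand side. In the Bernoulli case just described, $\operatorname{diam}(Y) = c_\Phi = 1$, so with $N = 10{,}000$ rounds and $\delta = 0.01$ the average $\frac1N \sum_{i=1}^N (y_i - f_i)$ is within $0.1$ of zero with lower probability at least $0.99$.

```python
import math


def wlln_radius(diam_y, c_phi, horizon, delta):
    """Right-hand side of (6): diam(Y) * c_Phi * (N * delta) ** (-1/2)."""
    return diam_y * c_phi / math.sqrt(horizon * delta)


print(wlln_radius(1.0, 1.0, 10_000, 0.01))   # prints 0.1
```

Because the radius decays like $N^{-1/2}$, quadrupling the horizon halves it.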



6 The K29 strategy and its properties 

According to Theorem 1, under the continuity assumption there is a strategy for Forecaster that does not allow $\mathcal{K}_n$ to grow, where $\mathcal{K}_n$ is defined by (3). Fortunately (but not unusually), this strategy depends on the feature mapping $\Phi$ only via the corresponding kernel $k$ defined by (4). The continuity assumption needed is that $k((f, x), (f', x'))$ should be continuous in $f$; such kernels will be called admissible. According to (5) (the factor 2 there does not affect the zeros), the corresponding forecasting strategy, which we will call the K29 strategy with parameter $k$, is to output, on the $n$th round, a forecast $f_n$ satisfying

$$S(f_n) := \sum_{i=1}^{n-1} k\bigl((f_i, x_i), (f_n, x_n)\bigr)(y_i - f_i) = 0$$

(or, if such $f_n$ does not exist, the forecast is chosen to be a point $f_n \in \partial F$ where $S(f_n)$ is normal and directed exteriorly to $F$).

The protocol of this section is essentially that of Forecasting Game 3; as Skeptic ceases to be an active player, it simplifies to:

FOR $n = 1, 2, \dots$:

Reality announces $x_n \in X$.

Forecaster announces $f_n \in \overline{\mathrm{co}}\,Y$.

Reality announces $y_n \in Y$.
END FOR



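Theorem 3 The K29 strategy guarantees that always

$$\Bigl\| \sum_{i=1}^n (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H} \le \operatorname{diam}(Y)\, c_\Phi \sqrt{n}, \qquad (7)$$

where $c_\Phi := \sup_{(f, x) \in F \times X} \|\Phi(f, x)\|_H$ is assumed to be finite.

Proof The K29 strategy ensures that (3) never increases; since (3) equals 0 for $n = 0$, we have

$$\Bigl\| \sum_{i=1}^n (y_i - f_i) \otimes \Phi(f_i, x_i) \Bigr\|_{L \otimes H}^2 \le \sum_{i=1}^n \|y_i - f_i\|_L^2 \, \|\Phi(f_i, x_i)\|_H^2 \le \operatorname{diam}^2(Y)\, c_\Phi^2\, n.$$

∎

Remark The property (7) is a special case of (6) corresponding to $\delta = 1$; we gave an independent derivation to make our exposition self-contained and to avoid the extra assumptions used in the derivation of (6), such as the horizon being finite and known in advance.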
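For binary forecasting ($L = \mathbb{R}$, $F = [0, 1]$) the K29 strategy is easy to implement: $S$ is a continuous real function on $[0, 1]$, so either an endpoint qualifies ($S(0) \le 0$ or $S(1) \ge 0$) or $S$ changes sign and a zero can be found by bisection. The Python sketch below assumes a Gaussian kernel on $F \times X$ with $X \subseteq \mathbb{R}$ (a hypothetical choice of admissible kernel, for which $c_\Phi = 1$) and checks (7) against a Reality that always moves away from the forecast. With $z_i := (f_i, x_i)$ and $\operatorname{diam}(Y) = 1$, the squared left-hand side of (7) is the quadratic form $\sum_{i,j} k(z_i, z_j)(y_i - f_i)(y_j - f_j)$.

```python
import math


def gaussian_kernel(z1, z2, width=0.2):
    """Hypothetical admissible kernel on F x X; k(z, z) = 1, so c_Phi = 1."""
    (f1, x1), (f2, x2) = z1, z2
    return math.exp(-((f1 - f2) ** 2 + (x1 - x2) ** 2) / (2.0 * width ** 2))


def k29_forecast(past, x, kernel=gaussian_kernel, tol=1e-12):
    """One K29 step for Y = {0, 1}: past holds (f_i, x_i, y_i); x is x_n."""
    def S(f):
        return sum(kernel((f_i, x_i), (f, x)) * (y_i - f_i) for f_i, x_i, y_i in past)

    if S(0.0) <= 0:          # S points out of [0, 1] at 0 (or vanishes there)
        return 0.0
    if S(1.0) >= 0:          # S points out of [0, 1] at 1 (or vanishes there)
        return 1.0
    lo, hi = 0.0, 1.0        # S(lo) > 0 > S(hi): bisect for a zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


past = []
for n in range(60):
    x = (7 * n % 10) / 10.0                  # deterministic data sequence
    f = k29_forecast(past, x)
    y = 1 if f < 0.5 else 0                  # adversarial Reality
    past.append((f, x, y))

z = [(f, x) for f, x, _ in past]
r = [y - f for f, _, y in past]
quad = sum(gaussian_kernel(z[i], z[j]) * r[i] * r[j] for i in range(60) for j in range(60))
print(math.sqrt(quad), math.sqrt(60))        # left- and right-hand sides of (7)
```

Bisection only returns an approximate zero, so the bound holds up to a tiny slack proportional to `tol`.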



K29 with reproducing kernel Hilbert spaces 

A reproducing kernel Hilbert space (usually abbreviated to RKHS) is a Hilbert space $\mathcal{F}$ of real-valued functions on some set $Z$ such that all evaluation functionals $F \in \mathcal{F} \mapsto F(z)$, $z \in Z$, are continuous. We will be interested in RKHS on the Cartesian product $F \times X$.

By the Riesz representation theorem, for each $z \in Z$ there exists a function $k_z \in \mathcal{F}$ such that

$$F(z) = \langle k_z, F \rangle_{\mathcal{F}}, \quad \forall F \in \mathcal{F}.$$

Let

$$c_{\mathcal{F}} := \sup_{z \in Z} \|k_z\|_{\mathcal{F}}; \qquad (8)$$

we will be interested in the case $c_{\mathcal{F}} < \infty$.
The kernel of an RKHS $\mathcal{F}$ on $Z$ is

$$k(z, z') := \langle k_z, k_{z'} \rangle_{\mathcal{F}} \qquad (9)$$

(equivalently, we could define $k(z, z')$ as $k_z(z')$ or as $k_{z'}(z)$). It is clear that (9) is a special case of the generalization

$$k(z, z') := \langle \Phi(z), \Phi(z') \rangle_H \qquad (10)$$

of (4). In fact, the functions $k$ that can be represented as (10) are exactly the functions that can be represented as (9); they can be equivalently defined as symmetric positive definite functions on $Z^2$ (see [3] for a list of references).

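A long list of RKHS together with their kernels is given in [?], §7.4. We will only give one example: the Sobolev space $\mathcal{S}$ of absolutely continuous functions $F$ on $\mathbb{R}$ with finite norm

$$\|F\|_{\mathcal{S}} := \Bigl( \int F^2(z)\, dz + \int \bigl(F'(z)\bigr)^2 dz \Bigr)^{1/2}; \qquad (11)$$

its kernel is

$$k(z, z') = \tfrac{1}{2} \exp(-|z - z'|)$$

(see [?] or [?], §7.4, Example 24). From the last equation we can see that $c_{\mathcal{S}} = 1/\sqrt{2}$, since $\|k_z\|_{\mathcal{S}}^2 = k(z, z) = 1/2$ for every $z$.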
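The Python sketch below (NumPy; the grid of points is arbitrary) evaluates this kernel, recomputes $c_{\mathcal{S}}$ as $\sup_z \sqrt{k(z, z)}$, and checks on a finite grid that the Gram matrix is symmetric and positive semidefinite, as the kernel of any RKHS must be.

```python
import numpy as np


def sobolev_kernel(z1, z2):
    """Kernel of the Sobolev space with norm (11): (1/2) exp(-|z - z'|)."""
    return 0.5 * np.exp(-np.abs(z1 - z2))


points = np.linspace(-3.0, 3.0, 25)
gram = sobolev_kernel(points[:, None], points[None, :])   # 25 x 25 Gram matrix
c_S = np.sqrt(sobolev_kernel(points, points).max())       # sup_z sqrt(k(z, z))
eigenvalues = np.linalg.eigvalsh(gram)                    # real, ascending order
print(c_S, eigenvalues.min())
```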

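The following is an easy corollary of Theorem 3.

Theorem 4 Let $\mathcal{F}$ be an RKHS on $F \times X$. The K29 strategy with parameter $k$ (defined by (9)) ensures

$$\Bigl\| \sum_{i=1}^n F(f_i, x_i)(y_i - f_i) \Bigr\|_L \le \operatorname{diam}(Y)\, c_{\mathcal{F}}\, \|F\|_{\mathcal{F}} \sqrt{n} \qquad (12)$$

for each function $F \in \mathcal{F}$, where $c_{\mathcal{F}}$ is defined by (8).

Proof Let $\Phi: F \times X \to H := \mathcal{F}$ be defined by $\Phi(z) := k_z$. Theorem 3 then implies

$$
\begin{aligned}
\Bigl\| \sum_{i=1}^n F(f_i, x_i)(y_i - f_i) \Bigr\|_L
&= \Bigl\| \sum_{i=1}^n \langle k_{f_i, x_i}, F \rangle_{\mathcal{F}} (y_i - f_i) \Bigr\|_L
 = \Bigl\| \Bigl( \sum_{i=1}^n (y_i - f_i) \otimes k_{f_i, x_i} \Bigr) F \Bigr\|_L \\
&\le \Bigl\| \sum_{i=1}^n (y_i - f_i) \otimes k_{f_i, x_i} \Bigr\|_{L \otimes \mathcal{F}} \|F\|_{\mathcal{F}}
 \le \operatorname{diam}(Y)\, c_{\mathcal{F}}\, \|F\|_{\mathcal{F}} \sqrt{n}
\end{aligned}
$$

(the second equality follows from Lemma 2 and the first inequality from Lemma 3 in Appendix B). ∎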






Calibration and resolution 



Two important properties of a forecasting strategy are its calibration and resolution, which we introduce informally. Our discussion in this section extends the discussion in [3], §5, to the case of linear protocols (in particular, to the case of multi-class forecasting). Forecaster's move space is assumed to be $F = \overline{\mathrm{co}}\,Y$. We say that the forecasts $f_n$ are properly calibrated if, for any $f^* \in F$,

$$\frac{\sum_{i = 1, \dots, n:\ f_i \approx f^*} y_i}{\sum_{i = 1, \dots, n:\ f_i \approx f^*} 1} \approx f^*,$$

provided $\sum_{i = 1, \dots, n:\ f_i \approx f^*} 1$ is not too small. (We shorten $(1/c)v$ to $v/c$, where $v$ is a vector and $c \ne 0$ is a number.) Proper calibration is only a necessary but far from sufficient condition for good forecasts: for example, a forecaster who ignores the data can be perfectly calibrated, no matter how much useful information the $x_n$ contain. (Cf. the discussion in [3].)

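We say that the forecasts $f_n$ are properly calibrated and resolved if, for any $(f^*, x^*) \in F \times X$,

$$\frac{\sum_{i = 1, \dots, n:\ (f_i, x_i) \approx (f^*, x^*)} y_i}{\sum_{i = 1, \dots, n:\ (f_i, x_i) \approx (f^*, x^*)} 1} \approx f^*, \qquad (13)$$

provided $\sum_{i = 1, \dots, n:\ (f_i, x_i) \approx (f^*, x^*)} 1$ is not too small.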

Instead of "crisp" points $(f^*, x^*) \in F \times X$ one may consider "fuzzy points" $I: F \times X \to [0, 1]$ such that $I(f^*, x^*) = 1$ and $I(f, x) = 0$ for all $(f, x)$ outside a small neighborhood of $(f^*, x^*)$. A standard choice would be something like $I := I_E$, where $E \subseteq F \times X$ is a small neighborhood of $(f^*, x^*)$ and $I_E$ is its indicator function, but we will want $I$ to be continuous (it can, however, be arbitrarily close to $I_E$).

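Suppose $F \subseteq \mathbb{R}^m$ and $X \subseteq \mathbb{R}^l$ for some $m, l \in \{1, 2, \dots\}$. Let $(f^*, x^*)$ be a point in $F \times X$; consider a small box $E := \prod_{i=1}^m [a_i, b_i] \times \prod_{j=1}^l [c_j, d_j]$ containing this point, $E \ni (f^*, x^*)$. The indicator $I_E$ of $E$ can be arbitrarily well approximated by the tensor product

$$I(f_1, \dots, f_m, x_1, \dots, x_l) = \prod_{i=1}^m F_i(f_i) \prod_{j=1}^l G_j(x_j)$$

of some functions $F_i$ and $G_j$ from the Sobolev class (11); here $f_1, \dots, f_m$ and $x_1, \dots, x_l$ denote the coordinates of a point of $F \times X$, not forecasts and data from different rounds. Let $\|I\|_{\mathcal{F}}$ be the norm of $I$ in the tensor product $\mathcal{F}$ of $m + l$ copies of $\mathcal{S}$ (see [?], §1.8, for an explicit description of tensor products of RKHS). Applying (12) to $F := I$, we can rewrite it as

$$\Bigl\| \frac{\sum_{i=1}^n I(f_i, x_i)\, y_i}{\sum_{i=1}^n I(f_i, x_i)} - \frac{\sum_{i=1}^n I(f_i, x_i)\, f_i}{\sum_{i=1}^n I(f_i, x_i)} \Bigr\|_L \le \frac{\operatorname{diam}(Y)\, c_{\mathcal{F}}\, \|I\|_{\mathcal{F}} \sqrt{n}}{\sum_{i=1}^n I(f_i, x_i)} \qquad (14)$$

(assuming the denominator $\sum_{i=1}^n I(f_i, x_i)$ is positive); therefore, we can expect proper calibration and resolution in the soft neighborhood $I$ of $(f^*, x^*)$ when

$$\sum_{i=1}^n I(f_i, x_i) \gg \sqrt{n}. \qquad (15)$$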
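The left-hand side of (14) is easy to monitor in practice. The Python sketch below treats the scalar case $m = l = 1$ and uses, as the Sobolev functions $F_1$ and $G_1$, tent functions that equal 1 at $f^*$ (respectively $x^*$) and vanish outside a window of half-width $h$; the tent shape, the window sizes, and the function names are illustrative assumptions. It returns the weighted gap together with the total weight $\sum_{i=1}^n I(f_i, x_i)$, which by (15) should be large compared with $\sqrt{n}$ before the gap is taken seriously.

```python
import numpy as np


def tent(t, center, half_width):
    """Absolutely continuous bump: 1 at center, 0 outside the window."""
    return np.clip(1.0 - np.abs(t - center) / half_width, 0.0, None)


def soft_calibration_gap(fs, xs, ys, f_star, x_star, h_f=0.1, h_x=0.1):
    """Return (gap, weight): the I-weighted mean of y_i - f_i and sum_i I(f_i, x_i)."""
    fs, xs, ys = (np.asarray(a, dtype=float) for a in (fs, xs, ys))
    w = tent(fs, f_star, h_f) * tent(xs, x_star, h_x)   # I(f_i, x_i) as a product
    weight = float(w.sum())
    if weight == 0.0:
        return None, 0.0                # no observations in the soft neighbourhood
    return float(w @ (ys - fs) / weight), weight


print(soft_calibration_gap([0.5, 0.5, 0.9], [0.0, 0.0, 5.0], [1, 1, 1], 0.5, 0.0))
```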
7 Further research



The main result of this paper is an existence theorem: we did not show how to compute Forecaster's strategy ensuring $\mathcal{K}_0 \ge \mathcal{K}_1 \ge \cdots$. (The latter was easy in the case of binary forecasting considered in [14].) It is important to develop computationally efficient ways to find zeros of vector fields, at least when $L = \mathbb{R}^m$. There are several popular methods for finding zeros, such as the Newton-Raphson method (see, e.g., [?], Chap. 9), but it would be ideal to have efficient methods that are guaranteed to find a zero (or a near zero) in a prespecified time.
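As an illustration of the kind of method mentioned here, the Python sketch below runs Newton-Raphson with a forward-difference Jacobian on a vector field in $\mathbb{R}^m$. It is only a heuristic: it ignores the constraint $f \in F$ and the boundary case of Theorem 1, and it has none of the guarantees asked for above, so the caller must check $\|S(f)\|$ on return. The test field and the function names are made-up examples.

```python
import numpy as np


def newton_zero(S, f0, tol=1e-10, max_iter=50, h=1e-7):
    """Try to find f with S(f) = 0 for S: R^m -> R^m; no convergence guarantee."""
    f = np.asarray(f0, dtype=float)
    for _ in range(max_iter):
        value = S(f)
        if np.linalg.norm(value) < tol:
            break
        jacobian = np.empty((f.size, f.size))
        for j in range(f.size):               # forward differences, column by column
            step = np.zeros_like(f)
            step[j] = h
            jacobian[:, j] = (S(f + step) - value) / h
        f = f - np.linalg.solve(jacobian, value)
    return f


target = np.array([0.3, 0.6])


def field(f):
    """Hypothetical smooth vector field with a unique zero near the target."""
    return target - f - 0.1 * f ** 3


root = newton_zero(field, [0.0, 0.0])
print(root, np.linalg.norm(field(root)))
```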
A Zeros of vector fields

The following lemma is the main component of the proof of Theorem 1.

Lemma 1 Let $F$ be a compact convex non-empty set in a Hilbert space $L$ and $S: F \to L$ be a continuous vector field on $F$. If at no point of the boundary $\partial F$ the vector field $S$ is normal and directed exteriorly to $F$, then there exists $f \in F$ such that $S(f) = 0$.

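Proof For each $f \in L$ define $\sigma(f)$ to be the point of $F$ closest to $f$. A standard argument (see, e.g., [?], Theorem 12.3) shows that such a point exists: if $d := \inf\{\|y - f\|_L \mid y \in F\}$, we can take any sequence $y_n \in F$ with $\|y_n - f\|_L \to d$ and apply the parallelogram law $\|a - b\|^2 + \|a + b\|^2 = 2\|a\|^2 + 2\|b\|^2$ to obtain

$$
\begin{aligned}
\|y_m - y_n\|_L^2 &= \|(y_m - f) - (y_n - f)\|_L^2 \\
&= 2\|y_m - f\|_L^2 + 2\|y_n - f\|_L^2 - \|(y_m - f) + (y_n - f)\|_L^2 \\
&= 2\|y_m - f\|_L^2 + 2\|y_n - f\|_L^2 - 4\Bigl\|\frac{y_m + y_n}{2} - f\Bigr\|_L^2 \\
&\le 2\|y_m - f\|_L^2 + 2\|y_n - f\|_L^2 - 4d^2 \to 2d^2 + 2d^2 - 4d^2 = 0
\end{aligned}
$$

as $m, n \to \infty$ (the inequality holds because $(y_m + y_n)/2 \in F$ by convexity); since $L$ is complete and $F$ is closed, $y_n \to y$ for some $y \in F$, and it is clear that $\|y - f\|_L = d$. A closest point is indeed unique: if $\|y_1 - f\|_L = \|y_2 - f\|_L = d$ and $y_1 \ne y_2$, the parallelogram law would give

$$
\begin{aligned}
\Bigl\|\frac{y_1 + y_2}{2} - f\Bigr\|_L^2 &= \frac14 \|(y_1 - f) + (y_2 - f)\|_L^2 \\
&= \frac12 \|y_1 - f\|_L^2 + \frac12 \|y_2 - f\|_L^2 - \frac14 \|(y_1 - f) - (y_2 - f)\|_L^2 \\
&= d^2 - \frac14 \|y_1 - y_2\|_L^2 < d^2, \qquad (16)
\end{aligned}
$$

which is impossible since $(y_1 + y_2)/2 \in F$.

Therefore, the function $\sigma$ is well-defined. It is also continuous: if $\|f - \sigma(f)\|_L = d$ and $f_n \to f$, then $\|f - \sigma(f_n)\|_L \to d$ (because $d \le \|f - \sigma(f_n)\|_L \le \|f - f_n\|_L + \|f_n - \sigma(f_n)\|_L \le 2\|f - f_n\|_L + d$) and, analogously to (16),

$$
\begin{aligned}
d^2 \le \Bigl\| \frac{\sigma(f) + \sigma(f_n)}{2} - f \Bigr\|_L^2 &= \frac14 \|(\sigma(f) - f) + (\sigma(f_n) - f)\|_L^2 \\
&= \frac12 \|\sigma(f) - f\|_L^2 + \frac12 \|\sigma(f_n) - f\|_L^2 - \frac14 \|\sigma(f) - \sigma(f_n)\|_L^2 \\
&= d^2 + o(1) - \frac14 \|\sigma(f) - \sigma(f_n)\|_L^2;
\end{aligned}
$$

therefore, $\sigma(f_n) \to \sigma(f)$ in $L$.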
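For multi-class forecasting $F$ is the probability simplex, and the nearest-point map $\sigma$ can be computed exactly by the well-known sort-and-threshold algorithm (a standard method, not one described in this paper). The Python sketch below (NumPy; the function name is hypothetical) implements it and checks the variational inequality $\langle f - \sigma(f), y - \sigma(f) \rangle_L \le 0$ for all $y \in F$, which characterizes the closest point of a closed convex set; since the inequality is linear in $y$, it suffices to check it at the vertices of the simplex.

```python
import numpy as np


def project_to_simplex(v):
    """Closest point sigma(v) of the probability simplex to v in R^m."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                          # coordinates in decreasing order
    css = np.cumsum(u) - 1.0
    ranks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ranks > 0)[0][-1]  # last index with a positive gap
    theta = css[rho] / (rho + 1.0)                # shift making the result sum to 1
    return np.maximum(v - theta, 0.0)


rng = np.random.default_rng(2)
for _ in range(100):
    f = rng.normal(scale=2.0, size=4)
    p = project_to_simplex(f)
    assert p.min() >= 0.0 and abs(p.sum() - 1.0) < 1e-12
    for y in np.eye(4):                           # vertices of the simplex
        assert (f - p) @ (y - p) <= 1e-12
```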

For each $f \in F$, let $\Sigma(f) := \sigma(f + S(f))$ be the point of $F$ closest to $f + S(f)$; since both $\sigma$ and $S$ are continuous, $\Sigma$ is continuous. By the Schauder-Tikhonov theorem (see, e.g., [?], Theorem 5.28) there is a point $f \in F$ such that $\Sigma(f) = f$. If $f$ is an interior point of $F$, $\sigma(f + S(f)) = f$ implies $S(f) = 0$, and so the conclusion of the lemma holds. It remains to consider the case $f \in \partial F$; in fact, we will show that this case is impossible. There exists $y \in F$ such that $\langle S(f), y - f \rangle_L > 0$ (otherwise, $S$ would have been normal and directed exteriorly to $F$ at $f$), and we find for $t \in (0, 1)$:

$$\|(f + S(f)) - ((1 - t)f + ty)\|_L^2 = \|S(f) - t(y - f)\|_L^2 = \|S(f)\|_L^2 - 2t \langle S(f), y - f \rangle_L + t^2 \|y - f\|_L^2;$$

for a small enough $t$ this gives

$$\|(f + S(f)) - ((1 - t)f + ty)\|_L^2 < \|S(f)\|_L^2,$$

a contradiction: $(1 - t)f + ty \in F$ by convexity, whereas $f = \sigma(f + S(f))$ is the point of $F$ closest to $f + S(f)$, at distance $\|S(f)\|_L$. ∎



B Tensor product 

In this appendix we list several definitions and simple facts about tensor products of Hilbert spaces, in the form used in this paper.

The tensor product $L \otimes H$ of Hilbert spaces $L$ and $H$ is defined in, e.g., [?], §11.4. Briefly, the definition is as follows. The space $L \otimes H$ is the subset of the set of bilinear forms $v(l', h')$, $l' \in L$ and $h' \in H$, obtained as the completion of the set of all linear combinations of the bilinear forms $l \otimes h$, where $l \in L$ and $h \in H$, defined by

$$(l \otimes h)(l', h') := \langle l, l' \rangle_L \langle h, h' \rangle_H; \qquad (17)$$

the inner product in $L \otimes H$ is determined uniquely by setting

$$\langle l_1 \otimes h_1, l_2 \otimes h_2 \rangle_{L \otimes H} := \langle l_1, l_2 \rangle_L \langle h_1, h_2 \rangle_H. \qquad (18)$$

In particular, (18) implies

$$\|l \otimes h\|_{L \otimes H} = \|l\|_L \|h\|_H \qquad (19)$$

for all $l \in L$ and $h \in H$.

If $v \in L \otimes H$ and $h \in H$, we define the product $vh \in L$ by the requirement

$$v(l', h) = \langle vh, l' \rangle_L, \quad \forall l' \in L$$

(the validity of this definition follows from the Riesz representation theorem: all bilinear forms in $L \otimes H$ are clearly continuous).

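Lemma 2 For any $l \in L$ and $h_1, h_2 \in H$,

$$(l \otimes h_1) h_2 = \langle h_1, h_2 \rangle_H\, l. \qquad (20)$$

Proof It suffices to prove

$$\langle (l \otimes h_1) h_2, l' \rangle_L = \langle h_1, h_2 \rangle_H \langle l, l' \rangle_L, \quad \forall l' \in L,$$

which, by definition, is equivalent to

$$(l \otimes h_1)(l', h_2) = \langle h_1, h_2 \rangle_H \langle l, l' \rangle_L$$

and, therefore, true (cf. (17)). ∎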
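The following lemma is an easy implication of the Cauchy-Schwarz inequality.

Lemma 3 For any $v \in L \otimes H$ and $h \in H$,

$$\|vh\|_L \le \|v\|_{L \otimes H} \|h\|_H.$$

Proof We are required to prove, for all $l' \in L$,

$$\langle vh, l' \rangle_L \le \|v\|_{L \otimes H} \|h\|_H \|l'\|_L,$$

that is, $|v(l', h)| \le \|v\|_{L \otimes H} \|h\|_H \|l'\|_L$. For $v = l \otimes h'$, with $l \in L$ and $h' \in H$, (17) and (18) give $v(l', h) = \langle l, l' \rangle_L \langle h', h \rangle_H = \langle v, l' \otimes h \rangle_{L \otimes H}$; by linearity and continuity, the identity $v(l', h) = \langle v, l' \otimes h \rangle_{L \otimes H}$ extends to all $v \in L \otimes H$. The last inequality then follows from the Cauchy-Schwarz inequality and (19). ∎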
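In finite dimensions, $L = \mathbb{R}^p$ and $H = \mathbb{R}^q$, these definitions reduce to matrix identities under the standard identification of $l \otimes h$ with the $p \times q$ matrix $l h^{\mathsf T}$: the bilinear form (17) is $v(l', h') = l'^{\mathsf T} v h'$, the inner product (18) is the Frobenius inner product, and $vh$ is the matrix-vector product. The Python sketch below (NumPy, random test vectors) checks (17) to (20) and Lemma 3 under this identification.

```python
import numpy as np

rng = np.random.default_rng(3)
l, l2 = rng.normal(size=3), rng.normal(size=3)       # elements of L = R^3
h1, h2 = rng.normal(size=4), rng.normal(size=4)      # elements of H = R^4

simple = np.outer(l, h1)                              # l ⊗ h1 as a 3 x 4 matrix
# (17): (l ⊗ h1)(l2, h2) = <l, l2> <h1, h2>
assert np.isclose(l2 @ simple @ h2, (l @ l2) * (h1 @ h2))
# (18): Frobenius inner product of simple tensors
assert np.isclose(np.sum(simple * np.outer(l2, h2)), (l @ l2) * (h1 @ h2))
# (19): ||l ⊗ h1|| = ||l|| ||h1||  (np.linalg.norm of a matrix is Frobenius)
assert np.isclose(np.linalg.norm(simple), np.linalg.norm(l) * np.linalg.norm(h1))
# (20), Lemma 2: (l ⊗ h1) h2 = <h1, h2> l
assert np.allclose(simple @ h2, (h1 @ h2) * l)

# Lemma 3 for a general, non-simple element v of L ⊗ H
v = sum(np.outer(rng.normal(size=3), rng.normal(size=4)) for _ in range(5))
assert np.linalg.norm(v @ h2) <= np.linalg.norm(v) * np.linalg.norm(h2) + 1e-12
```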